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Summary 



This article demonstrates that flexible and statistically tractable multi-modal diffusion 
1^ models can be attained by transformation of simple well-known diffusion models such as 

■y the Ornstein-Uhlenbeck model, or more generally a Pearson diffusion. The transformed 

'o^ diffusion inherits many properties of the underlying simple diffusion including its mixing 

'""' rates and distributions of first passage times. Likelihood inference and martingale esti- 

mating functions are considered in the case of a discretely observed bimodal diffusion. It 
is further demonstrated that model parameters can be identified and estimated when the 
fsj diffusion is observed with additional measurement error. The new approach is applied to 

^ molecular dynamics data in form of a reaction coordinate of the small Trp-zipper pro- 

• tein, for which the folding and unfolding rates are estimated. The new models provide a 

(^ better fit to this type of protein folding data than previous models because the diffusion 

cn coefficient is state-dependent. 



Key words: diffusion; mean passage time; measurement error; martingale estimating func- 
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1 Introduction 

In this article we propose a new class of stationary stochastic differential equation mod- 
els that have multi-modal invariant distributions. These models are useful for modeling 
dynamical systems that switch randomly between two or more regimes. As an example, 
we consider molecular dynamics data in form of a protein reaction coordinate with two 
regimes corresponding to the folded and unfolded state of the protein, respectively. Essen- 
tially protein folding happens in a high-dimensional space, but a remarkable consequence 
of the energy landscape theory is that folding is essentially a low dimensional process as 
it happens down a folding funnel. It has been suggested that folding can be accurately 
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captured by one or a few suitably chosen protein reaction coordinates (that is, univariate 



characteristics of the protein) with diffusive dynamics along the folding funnel, see Socci, 



Onuchic fc Wolynes (1996) ), Das et al. (2006), and the references therein. Note, however. 



that applications of bimodal diffusions are not limited to the study of molecular dynamics. 
Other applications of bimodal diffusion are as models of the global climate where the two 



regimes could be a cold and a hot climate as in Imkeller & Pavlyukevich (2002), and as 



financial models of e.g. interest rates subject to changes in the underlying financial and 



economic mechanisms as in Ai't-Sahalia (1996). 



Traditionally bimodal diffusion processes have been constructed by a stochastic dif- 
ferential equation with additive noise for a process moving in a double-well potential, i.e. 
a stochastic differential equation of the form 

dYt = -V'(Yt)dt + (T^dBt, (1) 

where {Bf} is a Wiener process and \^ is a potential with two valleys. Under the condition 
that V{y) goes to infinity at the boundaries of the state space and that the function 
h{y) = exp{~2V{y)/a^} is integrable on the state space, {Yt} is ergodic with invariant 
density proportional to h{y). An often studied example is given by the potential V{y) = 
%2(2/2 _ 2) with e > 0, for which the drift is -Ae{y^ - y) = -4:0y{y + l){y - 1). This 
simple potential has wells of the same depths at 1 and -1 and is symmetric around a 
separating potential barrier at 0. The related diffusion is ergodic with invariant density 
proportional to exp{—26{y^ — 2y'^)}. It is easy to generalize this model to models with 
wells at other points that need not be symmetric around the separating potential barrier. 
The double well potential models are the state of the art in the analysis of protein reaction 
coordinates. That is, constant diffusion is usually assumed and used in the estimation 



of the protein folding rates, see e.g. Socci, Onuchic & Wolynes (1996). A more complex 



model of molecular dynamics was presented in Pokern, Stuart & Wiberg (2009) who 
used a partially observed hypoelliptical diffusion to model the dihedral angle in a butane 
molecule. Still this model assumes a constant diffusion coefficient which may conflict with 
that of the data. More recently evidence of non-constant diffusion in protein reaction 



coordinates have been reported in several articles. Best & Hummer (2010) discusses these 
findings and their implications for the assessment of protein folding rates. 

Our new class of bimodal diffusions is obtained by applying particular transforma- 
tions to simple well-known diffusions such as the Ornstein-Uhlenbeck process or a general 



Pearson diffusion; see Forman & S0rensen (2008). This leads to diffusion models with 



nonlinear drift and non-constant diffusion coefficients that are still highly tractable both 
from a statistical and a computational point of view. A major point of this article is 
that many properties of diffusions are preserved by transformation. These include sta- 
tionarity, mixing properties, and distributions of first passage times. Also the eigenvalues 
of the infinitesimal generator of the diffusion are preserved by the transformation, and 
eigenfunctions transform in a straightforward way. This facilitates efficient approximate 
likelihood inference by means of e.g. the explicit martingale estimating functions proposed 



by Kessler & S0rensen (1999 ). In the rare cases where the likelihood function of a diffusion 
is explicitly known, this is also the case for any of its transformations. Similarly to the 
double well potential models, our new diffusion models allow for great flexibility in the 
modeling of the invariant marginal distribution. Thus, the new bimodal diffusion models 
provide a useful extension of the class of bimodal diffusion models. 



Multimodal diffusions c 

An alternative to the stochastic differential equation approach is to model each regime 
separately and to let the shifts between the models be determined by an underlying finite- 
state process such as a hidden Marlcov model or Markov state model. These models are 



widely employed as models of protein folding, see Prinz et al. (2011 ) for a review, although 
it is recognized that the models are inadequate in describing the more gradual transition 
between states which is the de facto behaviour of many proteins. Latent Markov state 



models are also very popular in financial and econometric applications, see Lange fc Rah 



bek (2009 ) for an overview. However, a latent state model is too complex and difficult to 
interpret if what is observed is not two different dynamical systems, but is really the same 
dynamical system that just has the property that it can be in two different regimes. A 
multi-modal diffusion has local attractions points corresponding to its regimes and moves 
between them in a continuous and random way. Apart from the conceptual advantage and 
the simpler model, other advantages of a multi-modal diffusion over two separate models 
is that the stationary marginal density in a succinct way contains important information 
about the regimes: the relative size of the modes reflect the time spent in each mode, and 
the peakedness and broadness of the modes reflect the volatility of the regimes. Moreover, 
explicit formulae for mean passage times allows for precise calculation of the time spent 
in each regime and the probability of switching from one regime to another. 

The article is organized as follows: Section |2] is devoted to the construction of our 
multi-modal diffusion model. We investigate the properties of these model and contrast 
them to other existing bimodal diffusions, the double-well potential models in particular. 
In Section |3] we discuss inference for the new class of bimodal diffusions emphasizing 
approximate likelihood inference based on martingale estimating functions. Inference is 
further discussed when the bimodal diffusion is observed with measurement error. In 
Section |4] we apply a bimodal diffusion model to molecular dynamics data in form of a 
reaction coordinate of the small Trp-zipper protein. Upon adjusting for measurement error 
we arrive at estimated folding rates that are realistic for this kind of protein. Section |5] 
concludes. 



2 Multi-modal diffusions by transformation 

2.1 Multi-modal distributions 

As a starting point we consider a bimodal density /. For instance, / could be the invariant 
density of the double-well potential diffusion (fTl) or the mixture of two unimodal densities 
/i and /2, 

/ = « ■ /i + (1 - a) ■ /2, < a < 1. (2) 

This density will be the marginal invariant density of our bimodal diffusion. As our main 
example, we consider the bimodal normal mixture density a-0(-; /xi, cr^)-|-(l— a)0(-; /i2, C2), 
where the mean parameters /ii and ^2 determine the location of the two modes and the 
variance parameters af and (T2 determine the broadness of each mode. Some instances of 
this density are shown in Figure [l] below. The bimodal normal density fits the protein 
reaction coordinate data in our case study (Section 111) well. In other applications it might 
be more relevant to consider densities /i and /2 that are constrained to a bounded interval 
(this is the case for certain protein reaction coordinates), or that are heavy-tailed or skew 
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(e.g. financial data). 

Note that if tlie mode points of tlie two unimodal densities are not located sufficiently 
far apart, then the mode points of the bimodal density are not identical to the mode 
points of the unimodal densities, or the mixture density might not be bimodal at all. 
This, however, is not a problem when applying the model to data where bimodality is 
manifest. Further, for particular choices of parameters, the bimodality of ^ can easily 
be checked; its mode points can be found numerically. 

An important point of the paper is that multi-modal diffusions with more than two 
regimes can also be constructed by the method presented in this section by simply choosing 
for / a multi-modal density. For instance, a tri-modal diffusion is obtained for / = 
"i/i + 0^2/2 + ^3/35 tti5 C(2, tt3 G (0, 1), ai + a2 + a3 = 1, where /j, z = 1, 2, 3, are unimodal 
probability densities whose modes are sufficiently separated. It is merely to simplify the 
presentation that we describe only the bimodal model in detail. 

2.2 Bimodal diffusions 

In order to model a bimodal (multi-modal) diffusion, we initially consider a stationary 
diffusion of general form, 

dXt = fi{Xt)dt + a{Xt)dBt, (3) 

where {Bt} is a Wiener process, and where we assume that the coefficients are sufficiently 
regular to ensure that a unique weak solution exists for any given initial condition. In 
principle {Xt} could be any diffusion, but we aim to construct bimodal diffusions for which 
statistical inference is relatively easy, so we are interested in cases where the diffusion 
{Xt} is analytically tractable. This is for instance the case if {Xf} is an ergodic Pearson 



diffusion as considered by Forman & S0rensen (2008), see in addition Wong (1964) for an 



early account on these processes. 

Recall that a stationary solution {Xt} to the stochastic differential equation exists if 

pr /"X* pr 

/ s{x)dx = / s{x)dx = 00 and / [s{x)(7^{x)]^^dx < 00. 
Jx# Je Je 

where the state space is {£,r) (—00 < i < r < 00), x* G {i,r) is arbitrary, and s{x) is 
the density of the scale measure 

s{x)=exp(-2J^^^-^^dyY x e {£,r). (4) 

Under these conditions the diffusion {Xt} is ergodic with invariant probability density 



Karlin & Taylor (1981). 



given by tc{x) = ij^{s{y)cr'^{y)} ^dy) ^ ■ {s{x)a'^{x)} ^ see 

Let n denote the cumulative distribution function of the invariant distribution of 
the diffusion {Xt} given by (|3]), then a stationary diffusion with bimodal (multi- modal) 
invariant density can be obtained by transforming {Xt} with the transformation 

T = p-^oU 

where F~^ is the quantile function of the bimodal (multi-modal) density. The cumulative 
distribution function of the bimodal mixture density (pi) is obviously given by F{y) = 
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aFi{y) + (1 — a)F2{y), where Fi and F2 are the cumulative distribution functions of 
the two unimodal distributions, so for practical purposes F~^ can easily be computed. 
The dynamics of the transformed diffusion {Yf} = {r(Xj)} are given by the stochastic 
differential equation, 

dYt = fi^{Yt) + a^{Yt)dBt, (5) 

2/x(r-i y)7r{r-' y) + a\r-^ yy{r-' y) a^r'' y^^r'^ y)f'{y) 






V\y) 



V{yy 



7r(r ^y)o{T ^ y) 

f{y) 



with r ^ = n ^ o F and where we have abbreviated r ^(y) as r ^ y. 
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Figure 1: Densities, drifts and diffusion coefficients of transformed Ornstein-Uhlenbeck 
processes ([T]) with parameters (3 = 1 and invariant bimodal normal mixture densities given 
by f{y) = a4>{y] Ati, ai) + (1 - a)(j){y] /ii, £12) with ai = (J2 = I and -/xi = /i2 = /i varying 
as ji = 1.05 (solid lines), /i = 1.5 (dashed lines), or /i = 2 (dotted lines). The upper panel 
shows symmetric densities (a = 0.5) and the lower panel asymmetric densities {a = 0.75). 



Example: Transformation of the stationary Ornstein-Uhlenbeck process, 

dXt = -uXtdt + V2^dBt (6) 

with invariant distribution equal to the standard normal distribution and autocorrelation 
function p(t) = e"'"*, yields the diffusion 



dYt 



-2z/ 



r-\Y,Mr''Y,) , ^\r-'Y,)nY,) 



fiYt) 



+ 



V{Ytf 



dt + 



fiXt) 



dBt, (7) 
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where r~^ = $^^ o F, and where ip and $ denote the density and the cumulative distri- 
bution function, respectively, of the standard normal distribution. Drifts and diffusion 
coefficients of some transformed Ornstein-Uhlenbeck process are shown in Figure [TJ Note 
that the diffusion coefficients of the transformed Ornstein-Uhlenbeck processes peaks in- 
between the modes, which makes the model a good candidate for protein reaction coor- 
dinates such as the one studied in Section HI • 
Both the transformed Ornstein-Uhlenbeck process ([T]) and the general transformed 
diffusion ([s]) inherits many properties from the simple diffusion {Xf}. The following 
results are straight-forward and thus stated without proof. 



Proposition 2.1 Let f be the bimodal mixture density defined by S), and {Xt} and {Yt} 
the diffusions specified by M) and [^, respectively, then the following hold true. 

1. If the densities /i and /2 distributions have finite k'th order moments fikifi) and 
fJ'k{f2), then so has the bimodal mixture density; fik{f) = tt/^fc(/i) + (1 — Ci)jj,k{f2)- 
In particular, the mean and variance are E{f) = aE{fi) + (1 — a)E{f2) and 

Var(/) = « Var(/i) + (1 - a) Var(/2) + a(l - a){i?(/i) - Eif^)}'. 

2. If {Xt} is stationary and ergodic, then so is {Yt}. 

3. The (a, (3 and p) mixing coefficients of {Xt} and {Yt} satisfy Kt{X) = Kt(Y), where 
K is either of a, (3, or p. In particular if {Xt} is K-mixing (i.e. Kj — )■ as t ^ ooj, 



then so is {Yt}. We refer to Doukhan (1994) for further explanation. 



4. If g is an eigenfunction for the infinitesimal generator of the diffusion {Xt} with 
eigenvalue X, then g o t^^ is an eigenfunction of the generator of {Yt} with the 
same eigenvalue (the infinitesimal generator of the diffusi on M) is the differential 



operator C = fi ■ -^ + \-^, see Karlin & Taylor (1981^. In particular it holds 



that E{g{T ^Yt)\YQ} = e ^^g{T ^ Yq) under mild regularity conditions, see Kessler 



& S0rensen (1999) 



5. If {p{x\x' , A)} denote the transition densities of {Xt} (i.e. p{x\x',/S) is the condi- 
tional density of Xt+i\ given Xt = x'), then the transition densities {q{y\y',A)} of 
{Yt} are given by q{y\y\ A) = p{t-^ y\ r^^ y' , A) ■ f{y)/-n{T-^ y) . 

One of the merits of the bimodal diffusion model is its capability of quantifying the 
shifts between its two regimes. These can be described by the passage times of the 
diffusion. We make use of this to estimate the folding and unfolding rate of the small 
Trp-zipper protein in Section |4j 

The first passage time to b of the general diffusion dsj) is defined by 



Tx{b) = mi{t>0:Xt = b} 
and given that a < b the mean passage time from a to 6 is equal to 

/a ph ph pb 

7r{x)dx ■ / s{x)dx + / tt{x) / s{y)dydx 
Ja Ja Jx 
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where s{x) is the scale density given by ^, see Karhn & Taylor (1981). Similarly, for the 
passage from 6 to a it holds that 

pr pb pb px 

E{Tx{a)\Xo = b) = 2 / 7T{x)dx ■ / s{x)dx + / ^{x) / s{y)dydx. 



The distributions of the passage times of the transformed diffusion are directly related to 
those of the underlying simple diffusion, as obviously Ty^b) = Txij^^ b). Hence we have 
the following result: 

Proposition 2.2 Denote by Tx a first passage time of the diffusion (^ and by Ty a first 
passage time of the transformed diffusion [^, then for any a,b G (£, r): 

{Ty{b)\Yo = a)^ {Tx{r-H)\X, = r-'a), {Ty{a)\Yo = b) ^ {Tx{t~' a)\Xo = r-'b). 

(8) 

Example: The mean passage times of the transformed Ornstein-Uhlenbeck process, ([T]), 
between points a < b are given by 

E{Ty{b)\Yo = a) = — r <l>(x)e"'/2rfx (9) 

and 

E{Ty{a)\Yo = b) = — r {1 - <l'(x)}e"'/'rfx. (10) 

l" Jr-^a 

where $ is the cumulative distribution function of the standard normal distribution. 
In fact the passage times of the Ornstein-Uhlenbeck can be described in greater detail 



as it is possible to find analytical expressions of their densities, see Alili, Patie fc Peder 



sen (2004)) and the references therein. Although involving series expansions and special 



functions, the densities can be computed by means of designated software. This could 
potentially be used to further describe the folding and unfolding of the small Trp-zipper 
protein studied in Section |4| • 

2.3 Comparison to other bimodal diffusion models 

Double well potential models such as ([I]) considered in the introduction are the state-of- 
the-art models for protein reaction coordinates. It should be noted that the double-well 
potential diffusion model is quite flexible and can be made to match any particular bimodal 

2 

density / by specifying the potential as V{x) = — ^ log/(x). One might suspect that a 
diffusion of our type is a transformation of yet another diffusion with constant diffusion 
coefficient that moves in a double-well potential, and that this is the real underlying cause 
for the bimodality. However, we will demonstrate that this is not the case: Suppose a 
diffusion {1^} with bimodal invariant density / is constructed by transformation of a 
simple diffusion {Xt} by the method in the previous section. To any diffusion of the 
general form n3« corresponds a unique transformation to a diffusion {X} with constant 
diffusion coefficient, the so-called Lamperti transformation, Xt = J J:{a{y)}~^dy, see 



e.g. lacus (2008). Since cr'^{y) = (j{t ^ y} ■ t'{t ^y}, it is not difficult to show that 
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when applying the Lamperti transformation of the bimodal transformed diffusion {Yt} = 
{r(Xt)} to this process, then the resulting process is identical to {Xj}, which is completely 
unrelated to the bimodal density /. Our approach to bimodal diffusions is thus genuinely 
different from the double-well potential models. In the new models the bimodality is not 
caused by a double-well potential; rather the bimodality is the product of an interplay 
between the smooth motion given by the drift and the random fluctuations given by the 
state-dependent diffusion coefficient. 



Ait-Sahalia (1996) proposed a general nonlinear diffusion model specified by, 



dXt = (a_iX-i + ao + aiXt + a2X^)dt + ^/p^^VKx^^^^t dBt, (11) 



The parameter constraints under which ( 11 ) is stationary and ergodic are complicated and 
we will not quote them here. The nonlinear diffusion model may display bimodality, but at 
the same time it contains simple unimodal models such as the Ornstein-Uhlenbeck process. 
The nonlinear bimodal diffusion process produces bimodality in the same way as our new 



model does, although in the model (11) most of the action is in the drift. In comparison 



with the other classes of models, the nonlinear diffusion (11) has the disadvantage that in 



its general form it does not admit simple explicit estimating functions, like our transformed 



diffusions do, and that analytical likelihood approximations do not simplify for (11) as 



they do for the double-well potential models, see Section p^ below. Also the nonlinear 



diffusion (11) is not quite as easy to simulate as the other models. 

In regard to the protein folding problem, any of the three classes of models may be 
relevant as different kinds of reaction coordinates display varying patterns of diffusion. 
For instance, our new model seems apt at modeling reaction coordinates that measure 
fluctuations in Cartesian conflguration space for which diffusion is increased in-between 



modes, e.g. as in Best & Hummer (2010), whereas the nonlinear diffusion model is a better 



model for a reaction coordinates in which diffusion decrease monotonically towards the 



folded state, e.g. as in Chahine et al. (2007). 



We end this Section by remarking that bimodality can also be produced solely by 
effects of the random fluctuations determined by a state dependent diffusion coefficient, 
as is clear from the following example. Suppose / is a bimodal density function for which 
the support is the real line. Then the diffusion given by 

dXt = , ^ dBt (12) 



is ergodic with invariant density /, see e.g. Bibby & S0rensen (1997). The bimodality 



is produced by the fact that near the two mode-points the random fluctuations are rel- 
atively small, so that the process tends to stay near these points. When the process is 
far from the mode-points, the random fluctuations are large and will quickly send the 
process to other parts of the state space. Thus the same bimodal invariant distribution 
can be attained for diffusions with very different drift and diffusion coefficients and by 
completely different mechanisms. Our approach combines the mechanisms of the double- 
well potential diffusions (motion in an energy landscape), and of the pure diffusion models 



(12) (state-dependent random fluctuations). 
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3 Statistical inference: 

3.1 Discretely observed diffusion 

In what follows we discuss inference for a discretely observed multi-modal diffusion of 
the type constructed in Section [2j We assume that observations are made at equidistant 
time points tj = iA ioi i = 0, . . . ,N, where A^^ > is the sampling frequency. We 
denote the observations by yo, ■ ■ ■ ,yN- Further, we assume that the distribution of the 
underlying simple diffusion {Xt} is dependent on a pi-dimensional vector of parameters u 
and that the bimodal density /^ is dependent on a p2- dimensional vector of parameters ip. 

Likelihood estimation 

Likelihood estimation is our preferred means of inference, but it is unfortunately rarely the 
case that the likelihood function of a diffusion process is explicitly known. Therefore in- 
ference for diffusion models is usually best carried out by means of approximate likelihood 
methods. A noteworthy exception is the Ornstein-Uhlenbeck process and transformations 
of it, for which the likelihood functions is explicitly known. In particular this is true for 
our bimodal transformation of the Ornstein-Uhlenbeck process for which the likelihood 
function is given by 

^"(*. ") = n 7T3l^^ [ ^1 _,-... j ^(^^r^. 

where cp denotes the standard normal density function. 

Approximate likelihood inference is applicable to many diffusions by using pit-Sahalia 



(2002 )'s analytical approximation to the likelihood function. In a simulation study by 



Jensen & Poulsen (2002) it was demonstrated that this particular likelihood approxima- 
tion is superior to other approximate likelihood methods both in terms of computing time 
and in terms of numerical accuracy. An implementation of this method can be found in 



lacus (2008). The first step in the analytical likelihood approximation is to compute the 
Lamperti transform of the diffusion, see Section [2j Hence, with likelihood inference in 
mind it is desirable to choose an underlying diffusion with a simple Lamperti transform. 
As demonstrated in Section [2] the process obtained when applying the Lamperti transform 
is invariant under transformation. Likelihood inference is therefore a natural choice for 
inference in the double- well potential model ([I]). Since the diffusion coefficient is already 



constant, the first step in Ait-Sahalia (2002 )'s approximation can be skipped 



Martingale estimating functions 

Martingale estimating functions are another way of approximating likelihood inference. 



Specifically, they are unbiased approximations to the score function, see Bibby, Jacobsen 



& S0rensen (2010). For instance the quadratic martingale estimating function, Bibby & 



S0rensen (1995), is a simple, yet often highly efficient, means for obtaining parameter 



estimates in a diffusion model, see S0rensen (2012a). If H]sf{ilj, v) denotes a general esti- 



mating function for the diffusion model (pj), then an estimate (?/', P) is obtained by solving 
the estimating equation Hn^iP, i/) = 0. 

For the Pearson diffusions we can not only find explicit expressions for moments and 
conditional moments (to the extent these exist), but also explicit polynomial eigenfunc- 
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tions for the infinitesimal generator. 



We can therefore find exphcit eigenfunctions for 

This in turn im- 



transformations of Pearson diffusions as shown in Proposition 2.1 

phes that we can find exphcit martingale estimating functions of the type proposed 



by Kessler & S0rensen (1999) for our bimodal diffusions. To be specific, suppose that 
gi{x; u), . . . ,gk{x; v) are eigenfunctions of the infinitesimal generator of the underlying 
simple diffusion with eigenvalues Ai(z/), ■ ■ ■ , Xkiy). Then gi o r~^, . . . ,gk o t~^ are eigen- 
functions of the generator of Y with the same eigenvalues. Therefore, 



GN{i^,i^) 



N k 



u. 



^) {qA 



v.ip 



Vi] v)-e 






-i;^)} 



is a martingale estimating function. In order to obtain an approximation to likelihood 
inference, the (pi-|-p2)-dimensional weight functions Wj should be chosen optimally in 
the sense of [Godambe fc Heyde (1987 ). If the eigenfunctions are polynomials, then the 
optimal weight function can be found explicitly as explained in Forman & S0rensen (2008 ) 



or 



S0rensen (2012b). This is the case for the Pearson diffusions. If a Pearson diffusion is 



transformed by a function r that does not depend on the parameters, then the optimal 
estimating function based on the eigenfunctions of the transformed process is simply equal 
to the estimating function obtained by inserting the data T~^yi in the optimal estimating 



function for the original Pearson diffusion, see e.g. Forman & S0rensen (2008) or Theorem 



1.19 in S0rensen (2012b). This fact was used by|Larsen fc S0rensen (2007) to estimate 



the parameters in a model for exchange rates in a target zone. 

For our bimodal diffusions, the situation is more complicated because the transforma- 
tion Tj,,/, is parameter dependent. In the following we extend the previous results to the 
case of a parameter dependent transformation. To simplify the presentation, we consider 
the case of polynomial eigenfunctions, gj{x; z/) = J2'e=o^'j,£-'^^- "^^^ optimal choice of the 
(P1+P2) X k-matiix of weights w = {wi, . . . ,Wk) is given by 



w*{y;v,i)) = B{y-v,i))V{y]v,i))- 



where the k x A;-matrix V{y;i',ip) is equal to Vh in Theorem 1.19 in S0rensen (2012b) 
(with K{y) = T'^ljy)- The matrix B is given by 



B 



Bh 

Up2Xfc 



5, 



where the pi x A;-matrix By, is as in Theorem 1.19 in S0rensen (2012b) with only the pi 
derivatives w.r.t. v included, Opjxfc is the p2 x fc-matrix with all entries equal to zero (this 
can be thought of as derivatives w.r.t. ip), and the entries Bij{y; f^if)) are given by 



E' 



'i/^ 



E,. 



-1 



,^ {{t,Ji,Ya) 



-'do 



'^u,i> '^A 



Yo = y) 



^Xj{u)A 



{^ulp y) ^ de^ ^^,J. y 



where 9 = {v^ip) (z = 1, . . . ,pi -(- p2, J = 1, • • • , k). Thus the optimal weights are explicit 
apart from the conditional expectation in the expression for B, which can be determined 
by simulation. Another solution is to expand the conditional expectation and the expo- 
nential function in powers of A, see e.g. Lemma 1.10 in S0rensen (2012b[). In this way 
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the following approximation to Bij is obtained 



11 



Km ^> V^) = E4^^ [Lu^^ iir'^'^y-'^ ^;,;) (y) - {r-^yy-'de. r-l,y\ A, 



where the differential operator L^^^ is the infinitesimal generator of the diffusion Y given 

by Ly^^jiy) = fi^{y)f'{y) + \<7'^{y)f"{y). The error made when replacing 5 by -B is of 
order A^, so the loss of efficiency is small if the sampling frequency is sufficiently high. 

Example: The Ornstein-Uhlenbeck process ^ has (among many others) the eigenfunc- 
tions X and x^ — 1, which for the untransformed process gives a quadratic martingale esti- 
mating function. The corresponding eigenvalues are u and 2z/. In this case rT^ = $^^oF^, 
where <l>~^ is the quantile function of the standard normal distribution. Thus rT^ does not 
depend on the parameter u of the underlying Ornstein-Uhlenbeck process. This implies 
some simplifications, e.g. 

Q -1 MM 

<^(^^ y) 

and Bij = 0. For j = 1, the other entries of B are 

E.,^ {d^ r/ Ya I r-i Yo = y) - e-^'^d^ t~' y. 

When /(■; ip) is a bimodal mixture of two normal densities, ip = (a, /xi, ui, fj,2, <72) and 
the derivatives d^F^p are easily found. They take the form 

- ^ ■<f{y;f^i,(^i) 

- ^ ■f{y;f^2,cr2) 

\ -^^^^^^■^{y;f^2,a2) J 



Asymptotics 

Under weak regularity conditions, the maximum likelihood estimator and the estimator 
obtained from the martingale estimating function G^r are consistent and asymptotically 
normal as iV — >^ oo. This follows from standard asymptotic results, e.g. Theorem 1.2 in 
S0rensen (2012b). The asymptotic variance of yN{6]^ — 9q) can be estimated by the 



inverse observed information in the case of the maximum likelihood estimator and by the 
inverse of 

1 ^ 
•^N = j^^ B{yi] On, ^iv)^(2/i; %, ipN)~^B{yi] %, ^JnT, 
1=1 

where '^ denotes transposition (the observed Godambe information), in the case of Gat 
with the optimal weights w*. If the underlying process is a Pearson diffusion, the regularity 
conditions on the parts of G^ that are given by the corresponding optimal martingale 
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estimating function for the Pearson diffusion {T^^l(yf)} can be verified as in 



Forman & 



S0rensen (2008) provided that the Pearson diffusion is ergodic with finite moments of 



order 2k. To treat the contribution to the estimating function from B (or B), additional 
conditions are needed on the smoothness of the functions de^ t~\ and on the moments of 

3.2 Diffusion observed with measurement error 

In biological applications it is often not possible to measure the phenomenon of interest 
without additional measurement error. Therefore we further discuss inference when a 
bimodal (multi-modal) diffusion is observed with error. For instance, the protein reaction 
coordinate considered in Section |4] is much better fitted by a diffusion-plus-error model 
than by a plain diffusion model (even though the error in the protein reaction coordinate 
is not measurement error in its strict sense, but rather reflects local features of the folding 
funnel of the protein) . 

Suppose that instead of observing {li}, we observe z(yf.,ej.), where z is a known 
function, and where {st) is a stationary, normally distributed and possibly correlated 
error process. For simplicity we consider the additive error model 

Zt = Yt + et, dst = -netdt + y^2i^dWt, (13) 

where (e^) is an Ornstein-Uhlenbeck process with marginal A/'(0, 7^)-distribution and ex- 
ponentially decaying autocorrelation function Pe(t) = exp(— /tt). Note that the methods 
considered in this Section can easily be extended to other error processes / other correla- 
tion functions. 

Example: We consider again the bimodal Ornstein-Uhlenbeck model ([t]). The additional 
error complicates the statistical analysis as the observed process is no longer a Markov 



process. Nevertheless the model ( 13 ) is still a tractable model in many regards as we will 
now demonstrate. The stationary marginal density oi {Zt} is the bimodal normal mixture 
density 

f{x) = a ■ (j){x; /ii, cr^ + 7^) + (1 - a) ■ 0(a;; /^s, o"2 + 7^)- 

Hence, consistent (though not efficient) estimates of a, /ii, /X2, erf + 7^, and o"| + 7^ 
can be obtained by maximizing the marginal likelihood function, i.e. by pretending that 
observations are i.i.d.. Upon fixing the above parameters at their estimates, the remaining 
parameters can be estimated by a least squares fit of the theoretical to the empirical 
autocorrelation function. The autocorrelation function of {Zt} is given by 

CoiiZ,, Zs+t) ■.= Pz{t) = (l-/3)py(t)+/3exp(-ft:t), 

where /3 is the proportion of error variance in the marginal distribution, i.e. 

Var(£:i) 7^ 



P 



Var(Zt) a{al + 7^) + (1 - a){ai + 7^) + a(l - a)(pi - /is)^ 



and where py is the autocorrelation function of {Yt}. The autocorrelations priti) are not 
explicitly known, but due to the tractability of the bimodal diffusion they can easily be 
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simulated. Initial values for the fit can be found by using that presumably z^ << k, so 
that pz{t) ~ (1 — /3) + /3exp(— Kt) as t — )■ and pz{t) ~ (1 — P)pY{t) as t — )■ oo. • 

More efficient estimators for the diffusion with error-model can be obtained from approxi- 
mate likelihood methods, which however are all computationally intensive. To name a few, 



we refer to Chib, Pitt & Shephard (2006) for Markov chain Monte Carlo methods and to 



Baltazar-Larios & S0rensen (2010) for estimation based on the EM-algorithm. Inference 



can also be based on the prediction-based estimating functions, see S0rensen (2000) and 



S0rensen (2011), although the joint moments in this context would have to be simulated. 
The tractability of the latent underlying Ornstein-Uhlenbeck processes simplifies all of 
the above mentioned methods (since it can be simulated exactly), but only to a certain 
extent. Efficient estimation is the topic of ongoing research which we will not pursue any 
further in the present article. 



4 Case study: The small Trp-zipper protein 

As an example, we consider molecular dynamics data in form of the L-reaction coordinate 
of the small Trp-zipper protein. The (high-dimensional) dynamics of the protein was 



simulated from the monte carlo algorithm Bottaro et al. (2012) using the PHAISTOS 



software package Boomsma et al. (2013). Subsequently the L-reaction coordinate which 
measures the total distance to a folded reference was computed resulting in a univariate 
time series. For the analysis we consider a subsample of 20,000 observations corresponding 
to a sampling frequency of A~^ =l/nsec. 

The sample path of the L reaction coordinate (Figure |2]) clearly reflects the two con- 
formal states of the small Trp-zipper protein, folded and unfolded. The main interest is to 
estimate the folding and unfolding rates of the protein, and this can be achieved by appli- 
cation of a bimodal diffusion model where the rates of switching between the folded and 
unfolded state correspond to the mean passage times between the modes of the invariant 
bimodal density. 

The histogram (Figure [2]) is well fltted by a bimodal normal mixture density. Further, 
the nonparametric estimates of drift and diffusion coefficient appear similar in shape to 
those of the bimodal transformation of the Ornstein-Uhlenbeck process ([7]). We note 
in particular, that the data display increased volatility in-between the modes. Similar 



patterns have been observed in other protein reaction coordinates, see e.g. Best & Hummer 



(2010). All in all, the bimodal transformation of the Ornstein-Uhlenbeck process is a 
good candidate model for the data. We fit the model making use of the explicit likelihood 
function which yields the following parameter estimates: 

a = 0.27, pi = 25.41, fi2 = 29.02, ai = 1.36, as = 2.59, and i> = 0.071. 

The lower and upper mode of the system are estimated hj I = fli and u = (l2- The 



passage times of the model are given by d9]) and (10). Thus we find the estimated folding 



14 



J. L. FORMAN & M. S0RENSEN 



Sample path 



Histogram 



40 - 



35 



g-30 



25 



20 




5000 10000 15000 20000 
Observation number 



Drift of L— projection 



20 25 30 35 

L— projection 

Diffusion of L— projection 



40 





24 



26 



28 



30 



32 



24 



26 



28 



30 



32 



Figure 2: Sample path, sample histogram, and nonparametric estimates of drift and 
diffusion coefficient for 20.000 observations of the L-reaction coordinate of the small Trp- 
zipper protein. The nonparametric estimators are local linear estimators, see [Fan fc Zhang 



(2003), evaluated at a bandwidth equal to 0.1. 



and unfolding rates of the protein: 



E{Tl\Y^ = u) 



E{Tl\Y, 



f^n 1-^ '^^ 



^ ■Ir-^l.l 



{l-$(a;)}e^'/2rfa; = 28.8. 



T""M2 



^{x)e''^'^dx = 18.5. 



'^ -/r-Vi 



Thus folding and unfolding should occur on average once in about fifteen to thirty nsec. 
These estimates of the folding and unfolding rates are unfortunately not realistic. We 
inspect the uniform residuals (Figure Isl) to check whether the fit of the bimodal transform 
of the Ornstein-Uhlenbeck process is satisfactory. 

The fit to the uniform distribution is reasonable, but not perfect, and the residuals 
appear to be negatively correlated, which might indicate a misfit of the model. To check 
whether it is at all plausible that the data was generated by a Markov process, we further 
applied the nonparametric test of the Markov hypothesis of Ait-Sahalia, Fan & Jiang 
(2010). From this we concluded that it is not likely that the protein data was generated 
by a Markov process (P < 0.0001). Note that since the convergence of the Markov test 
statistic to its asymptotic chi-square distribution is poor, the p-value was computed by 



fully non-parametric bootstrapping as described in Forman (2012). 
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Figure 3: Diagnostics for the fit of the bimodal transformation of the Ornstein-Uhlenbeck 
process to 20.000 observations of the L-projection of the small Trp-zipper protein. The 
left panel shows the QQ-plot of the uniform residuals and the right panel shows a lagplot 
of consecutive uniform residuals. 



We therefore apply the bimodal diffusion model with additional error (13) as a second 



approximation to the folding dynamics. This agrees well with the empirical autocorre- 
lation function (Figure 111) which shows an initial fast drop followed by a long-term slow 
decay. 
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Figure 4: Fit of the diffusion with error model (13) to the sample histogram, and sample 



autocorrelation function of 20.000 observations of the L-reaction coordinate of the small 
Trp-zipper protein. 



Figure |4] shows the fit of the pseudo-likelihood function for the marginal bimodal 
normal mixture distribution together with the least squares fit of the autocorrelation 
function (based on the first 100 lags). The parameter estimates are: 



a = 0.55, /ti = 25.66, fi2 = 30.94, 



CTi = 0.54, (72 = 1.22, z> = 0.0015, 



7 



0.88, and k = 0.43. 



The upper and lower mode of the latent diffusion are estimated by u = fii and I = JI2, 



and the implied folding and unfolding rates of the protein given by ph and ( 10 ) are now 
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estimated by: 



E{T^\yo 



u 



M2 



'{l-<l>{x)}e'''/^dx = 1138, 



fj-i 



E{K\Y, 



'l-K 



M2 



$rx)e^'/2^x = 1320. 



A*! 



Folding and unfolding should thus occur on average once in just over a micro second. 
These estimates are realistic for the protein folding in contrary to the ones implied by the 
plain diffusion model without measurement error. 

As an informal goodness of fit test, we have simulated two trajectories from the plain 
diffusion model and the diffusion plus noise model, respectively, with parameters equal 
to those estimated for the data. The simulated data are shown in Figure |5| It is obvious 
that the model with local error mimics the dynamics of the protein data much better than 
the plain bimodal diffusion model (without error) does. 
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Figure 5: Left panel: A simulated sample path of the bimodal diffusion model ^\. Right 
panel: A simulated sample path of the diffusion plus error model (13). Parameters for 



the simulations were taken from the fit of the two models to the protein data. 



5 Conclusion 



Flexible and statistically tractable multi-modal diffusion models have been developed, 
where the multi-modality is caused by a combination of the effects of an energy landscape 
and of state- dependent random fluctuations. The new diffusion models were obtained by 
transformations of simple well-studied diffusion models. The transformed diffusion was 
shown to inherit many properties of the underlying simple diffusion, including its mixing 
rates and the distributions of first passage times. The eigenf unctions of the infinitesimal 
generator transform in a straightforward way. Particularly tractable models are obtained 
by transformations of the Ornstein-Uhlenbeck model, but also transformations of more 
general Pearson diffusions give tractable multi-model diffusion models. 

Likelihood inference and martingale estimating functions were given and investigated 
in the case of a discretely observed bimodal diffusion. In particular, the theory of mar- 
tingale estimating functions for transformed diffusions was generalized to the case of 
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parameter-dependent transformations. An estimation method was presented for the case 
where the bimodal diffusion is observed with additional correlated measurement error. 

The new approach was applied to molecular dynamics data in form of a reaction 
coordinate of the small Trp-zipper protein, where the stationary distribution has two 
modes corresponding to a folded and an unfolded state. The folding and unfolding rates 
were estimated by the mean passage times between the two mode points. The new model 
provides a better fit to this type of protein folding data than previous models because 
the diffusion coefficient is state-dependent, but the fit is far from perfect. A much more 
satisfactory fit and realistic folding rates were obtained by adding correlated measurement 
error refiecting local features of the folding funnel of the protein. 
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